Chapter 3: Linear Regression

The linear regression model is expressed as follows \begin{align} Y= \beta_0 + \sum_{j=1}^{p}\beta_{j} X_{j} + \epsilon, \end{align}

where $X_j$ represents the $j$-th predictor, $\beta_j$ quantifies the association between that variable and the response, and $\epsilon$ is an error term.

When $p=1$, $\beta_0$ and $\beta_1$ are known as the intercept and slope terms, respectively.

In practice, the true coefficients $\beta_0,~\beta_1,~\ldots,~\beta_p$ are unknown. Instead, we compute estimates $\hat{\beta}_0,\hat{\beta}_1, \ldots, \hat{\beta}_p$, which can then be used to make predictions,

\begin{align} \hat{y}=\hat{\beta}_0 +\hat{\beta}_1x_1 +\hat{\beta}_2 x_2 + \ldots +\hat{\beta}_p x_p. \end{align}

Using the least-squares approach, the estimates $\hat{\beta}_0$, $\hat{\beta}_1$, $\ldots$, $\hat{\beta}_p$ are chosen to minimize the residual sum of squares (RSS),

\begin{align} RSS=\sum_{i=1}^{n}(y_i-\hat{y}_i)^2=\sum_{i=1}^{n}(y_i-\hat{\beta}_0 - \hat{\beta}_1 x_{i1} - \hat{\beta}_2 x_{i2} - \hat{\beta}_3 x_{i3}-\ldots - \hat{\beta}_p x_{ip})^2. \end{align}
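Following the ISLR-python port [2], the least-squares fit can be sketched in NumPy. The data below are synthetic, and the generating coefficients (2, 3, -1) are assumptions of this illustration, not values from the book:

```python
import numpy as np

# Synthetic data for illustration; the true coefficients are hypothetical.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = 2 + 3 * X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=n)

# Augment the predictors with an intercept column; np.linalg.lstsq then
# returns the coefficients that minimize the RSS defined above.
A = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(A, y, rcond=None)

rss = np.sum((y - A @ beta_hat) ** 2)
```

With 200 observations and little noise, the estimates land close to the generating coefficients.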

There are a few definitions that we need throughout this document.

$R^2$ Statistic

The $R^2$ statistic measures the proportion of the variance in $Y$ that is explained by the linear relationship with $X$. To calculate $R^2$, we use the formula \begin{align} R^2 =1-\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}. \end{align}
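A minimal sketch of this formula on illustrative (synthetic) data; note that in simple linear regression $R^2$ equals the squared correlation between $X$ and $Y$:

```python
import numpy as np

# Illustrative data, not from the book.
rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 1.5 * x + rng.normal(scale=0.5, size=100)

# Fit a simple linear regression, then compute R^2 = 1 - RSS/TSS.
slope, intercept = np.polyfit(x, y, 1)
y_hat = intercept + slope * x
rss = np.sum((y - y_hat) ** 2)
tss = np.sum((y - np.mean(y)) ** 2)
r_squared = 1 - rss / tss
```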

Correlation Matrix

The correlation matrix is the matrix whose $(i,j)$ entry is the correlation between the $i$th and $j$th variables of the data set, where the correlation between $X$ and $Y$ is defined as

\begin{align} \mathrm{Cor}(X,Y)=\frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}. \end{align}

The correlation matrix is the covariance matrix of the standardized variables. The covariance matrix itself is also known as the auto-covariance matrix, dispersion matrix, variance matrix, or variance-covariance matrix.
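The equivalence between the correlation matrix and the covariance matrix of the standardized variables can be checked numerically on illustrative data:

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(size=(200, 3))      # 200 observations of 3 variables
data[:, 1] += 0.8 * data[:, 0]        # make variables 0 and 1 correlated

# np.corrcoef treats rows as variables, so pass the transpose.
C = np.corrcoef(data.T)

# Equivalently: standardize each column (z-scores) and take the
# covariance matrix of the result.
Z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
C_via_cov = np.cov(Z.T)
```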

Advertising Example

In this section, we work with the ISLR Advertising data set, which records the sales of a product in 200 different markets, along with the advertising budget for the product in each of those markets across three different media: TV, radio, and newspaper. The dataset is available at [3].

Here, $\beta_j$ is interpreted as the average effect on $Y$ of a one-unit increase in $X_j$, holding all other predictors fixed. We have $$sales = \beta_0 + \beta_1 \times TV + \beta_2 \times radio + \beta_3 \times newspaper + \epsilon. $$

The least-squares estimates of $\beta_0$, $\beta_1$, $\beta_2$, and $\beta_3$, the resulting RSS, and the correlation matrix and its plot are reported below.

We can see that the correlation between radio and newspaper is about 0.35. In other words, we can see a tendency to spend more on newspaper advertising in markets where more is spent on radio advertising.

Since the p-value for newspaper advertising is not significant, we can drop it and visualize the sales, TV, and radio advertising data in a 3D diagram:

$$sales = \beta_0 + \beta_1 \times TV + \beta_2 \times radio$$

Note that we do not have to define our linear model in an additive way. For example, a linear model that uses radio, TV, and an interaction between the two to predict sales takes the form \begin{align} sales &= \beta_0 + \beta_1 \times TV + \beta_2 \times radio + \beta_3 \times ( radio \times TV ) + \epsilon\\ &= \beta_0 + (\beta_1 + \beta_3 \times radio ) \times TV + \beta_2 \times radio + \epsilon. \end{align}

The results clearly suggest that the model including the interaction term is superior to the model containing only the main effects. Since the p-value for the interaction term, TV $\times$ radio, is noticeably low, the true relationship is evidently not additive.
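The interaction model amounts to adding a product column to the design matrix. A sketch on synthetic advertising-style data; the generating coefficients below are hypothetical, not the estimates from the ISLR Advertising data:

```python
import numpy as np

# Synthetic advertising-style data; generating coefficients are hypothetical.
rng = np.random.default_rng(3)
n = 200
tv = rng.uniform(0, 300, size=n)
radio = rng.uniform(0, 50, size=n)
sales = 6 + 0.02 * tv + 0.03 * radio + 0.001 * tv * radio \
        + rng.normal(scale=0.5, size=n)

# The interaction enters the model as an extra product column.
A = np.column_stack([np.ones(n), tv, radio, tv * radio])
beta_hat, *_ = np.linalg.lstsq(A, sales, rcond=None)
```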

Credit Example

The Credit dataset contains the balance (average credit card debt for a number of individuals) as well as several quantitative predictors: age, cards (number of credit cards), education (years of education), income (in thousands of dollars), limit (credit limit), and rating (credit rating).

This dataset can be extracted from the ISLR package using the following syntax.

library(ISLR)
write.csv(Credit, "Credit.csv")

Each panel of the Figure is a scatterplot for a pair of variables whose identities are given by the corresponding row and column labels.

For example, the scatterplot directly to the right of the word “Balance” depicts balance versus age, while the plot directly to the right of “Age” corresponds to age versus cards. In addition to these quantitative variables, we also have four qualitative variables: gender, student (student status), status (marital status), and ethnicity (Caucasian, African American, or Asian).

Predictors with Only Two Levels

We simply create an indicator, or dummy variable, that takes on two possible numerical (binary) values.

For example, based on the gender variable, we can create a new binary variable that takes the form $$x_i =\begin{cases}1, & \mbox{if ith person is female}, \\0, & \mbox{if ith person is male},\end{cases}$$

and use it as a predictor in the regression equation:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i =\begin{cases}\beta_0 + \beta_1 + \epsilon_i, & \mbox{if ith person is female}, \\\beta_0 + \epsilon_i , & \mbox{if ith person is male.}\end{cases}$$
For a qualitative predictor with more than two levels, such as ethnicity, we create additional dummy variables. With two dummies and African American as the baseline level:

$$y_{i} = \beta_0 +\beta_1 x_{i1} +\beta_2 x_{i2} +\epsilon_i = \begin{cases} \beta_0 +\beta_1 +\epsilon_i , & \mbox{if ith person is Asian},\\ \beta_0 +\beta_2 +\epsilon_i, & \mbox{ if ith person is Caucasian},\\ \beta_0 +\epsilon_i , & \mbox{if ith person is African American}. \end{cases}$$

Similarly, for marital status:

$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i =\begin{cases}\beta_0 + \beta_1 + \epsilon_i, & \mbox{if ith person is married}, \\\beta_0 + \epsilon_i , & \mbox{if ith person is not married.}\end{cases}$$
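The dummy encoding above can be sketched directly in NumPy. The ethnicity values here are illustrative, with African American as the baseline level:

```python
import numpy as np

# A three-level qualitative predictor encoded with two dummy variables;
# the baseline level is the row where both dummies are zero.
ethnicity = np.array(["Asian", "Caucasian", "African American",
                      "Asian", "Caucasian"])
x1 = (ethnicity == "Asian").astype(float)      # 1 if Asian, else 0
x2 = (ethnicity == "Caucasian").astype(float)  # 1 if Caucasian, else 0

# Design matrix: intercept plus the two dummy columns.
A = np.column_stack([np.ones(len(ethnicity)), x1, x2])
```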

We can model balance using income and the student dummy. Without an interaction term between income and student, we have

\begin{align} \text{balance}_i \approx \beta_0 + \beta_1 \times \text{income}_i + \begin{cases} \beta_2, & \mbox{if ith person is a student},\\0, & \mbox{if ith person is not a student} \end{cases} =\beta_1 \times \text{income}_i + \begin{cases} \beta_0 + \beta_2, & \mbox{if ith person is a student},\\ \beta_0, & \mbox{if ith person is not a student} \end{cases} \end{align}

Allowing an interaction term between income and student gives

\begin{align} \text{balance}_i &\approx \beta_0 + \beta_1 \times \text{income}_i + \begin{cases} \beta_2+\beta_3 \times \text{income}_i, & \mbox{if ith person is a student},\\0, & \mbox{if ith person is not a student} \end{cases}\\ &=\begin{cases} (\beta_0+\beta_2)+(\beta_1+\beta_3) \times \text{income}_i, & \mbox{if ith person is a student},\\ \beta_0 + \beta_1 \times \text{income}_i, & \mbox{if ith person is not a student} \end{cases} \end{align}

The fits of Regression 1 (without the interaction term) and Regression 2 (with the interaction term) are reported below.
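Both regressions can be sketched on synthetic Credit-style data; the generating coefficients below are hypothetical, not the ISLR Credit estimates:

```python
import numpy as np

# Synthetic Credit-style data; generating coefficients are hypothetical.
rng = np.random.default_rng(4)
n = 300
income = rng.uniform(10, 150, size=n)
student = rng.integers(0, 2, size=n).astype(float)   # dummy: 1 = student
balance = (200 + 6 * income + 380 * student
           - 2 * income * student + rng.normal(scale=20, size=n))

# Regression 1: no interaction -- parallel lines, different intercepts.
A1 = np.column_stack([np.ones(n), income, student])
b1, *_ = np.linalg.lstsq(A1, balance, rcond=None)
rss1 = np.sum((balance - A1 @ b1) ** 2)

# Regression 2: with interaction -- students get slope beta_1 + beta_3.
A2 = np.column_stack([np.ones(n), income, student, income * student])
b2, *_ = np.linalg.lstsq(A2, balance, rcond=None)
rss2 = np.sum((balance - A2 @ b2) ** 2)
```

Because Regression 1 is nested in Regression 2, the interaction model can never have a larger RSS.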


References

  1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112, pp. 3-7). New York: Springer.
  2. Jordi Warmenhoven, ISLR-python, GitHub repository.
  3. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2017). ISLR: Data for an Introduction to Statistical Learning with Applications in R. R package.